 human control


Military AI Needs Technically-Informed Regulation to Safeguard AI Research and its Applications

Simmons-Edler, Riley, Dong, Jean, Lushenko, Paul, Rajan, Kanaka, Badman, Ryan P.

arXiv.org Artificial Intelligence

Military weapon systems and command-and-control infrastructure augmented by artificial intelligence (AI) have seen rapid development and deployment in recent years. However, the sociotechnical impacts of AI on combat systems, military decision-making, and the norms of warfare have been understudied. We focus on a specific subset of lethal autonomous weapon systems (LAWS) that use AI for targeting or battlefield decisions. We refer to this subset as AI-powered lethal autonomous weapon systems (AI-LAWS) and argue that they introduce novel risks -- including unanticipated escalation, poor reliability in unfamiliar environments, and erosion of human oversight -- all of which threaten both military effectiveness and the openness of AI research. These risks cannot be addressed by high-level policy alone; effective regulation must be grounded in the technical behavior of AI models. We argue that AI researchers must be involved throughout the regulatory lifecycle. Thus, we propose a clear, behavior-based definition of AI-LAWS -- systems that introduce unique risks through their use of modern AI -- as a foundation for technically grounded regulation, given that existing frameworks do not distinguish them from conventional LAWS. Using this definition, we propose several technically-informed policy directions and invite greater participation from the AI research community in military AI policy discussions.


Limits of Safe AI Deployment: Differentiating Oversight and Control

Manheim, David, Homewood, Aidan

arXiv.org Artificial Intelligence

Oversight and control, which we collectively call supervision, are often discussed as ways to ensure that AI systems are accountable, reliable, and able to fulfill governance and management requirements. However, the requirements for "human oversight" risk codifying vague or inconsistent interpretations of key concepts like oversight and control. This ambiguous terminology could undermine efforts to design or evaluate systems that must operate under meaningful human supervision. This matters because the term is used by regulatory texts such as the EU AI Act. This paper undertakes a targeted critical review of literature on supervision outside of AI, along with a brief summary of past work on the topic related to AI. We next differentiate control as ex-ante or real-time and operational rather than policy or governance, and oversight as performed ex-post, or a policy and governance function. Control aims to prevent failures, while oversight focuses on detection, remediation, or incentives for future prevention. Building on this, we make three contributions. 1) We propose a framework to align regulatory expectations with what is technically and organizationally plausible, articulating the conditions under which each mechanism is possible, where they fall short, and what is required to make them meaningful in practice. 2) We outline how supervision methods should be documented and integrated into risk management, and drawing on the Microsoft Responsible AI Maturity Model, we outline a maturity model for AI supervision. 3) We explicitly highlight boundaries of these mechanisms, including where they apply, where they fail, and where it is clear that no existing methods suffice. This foregrounds the question of whether meaningful supervision is possible in a given deployment context, and can support regulators, auditors, and practitioners in identifying both present and future limitations.


Reduced AI Acceptance After the Generative AI Boom: Evidence From a Two-Wave Survey Study

Baumann, Joachim, Urman, Aleksandra, Leicht-Deobald, Ulrich, Roman, Zachary J., Hannák, Anikó, Christen, Markus

arXiv.org Artificial Intelligence

The rapid adoption of generative artificial intelligence (GenAI) technologies has led many organizations to integrate AI into their products and services, often without considering user preferences. Yet, public attitudes toward AI use, especially in impactful decision-making scenarios, are underexplored. Using a large-scale two-wave survey study (n_wave1=1514, n_wave2=1488) representative of the Swiss population, we examine shifts in public attitudes toward AI before and after the launch of ChatGPT. We find that the GenAI boom is significantly associated with reduced public acceptance of AI (see Figure 1) and increased demand for human oversight in various decision-making contexts. The proportion of respondents finding AI "not acceptable at all" increased from 23% to 30%, while support for human-only decision-making rose from 18% to 26%. These shifts have amplified existing social inequalities in terms of widened educational, linguistic, and gender gaps post-boom. Our findings challenge industry assumptions about public readiness for AI deployment and highlight the critical importance of aligning technological development with evolving public preferences.


Red Lines and Grey Zones in the Fog of War: Benchmarking Legal Risk, Moral Harm, and Regional Bias in Large Language Model Military Decision-Making

Drinkall, Toby

arXiv.org Artificial Intelligence

As military organisations consider integrating large language models (LLMs) into command and control (C2) systems for planning and decision support, understanding their behavioural tendencies is critical. This study develops a benchmarking framework for evaluating aspects of legal and moral risk in targeting behaviour by comparing LLMs acting as agents in multi-turn simulated conflict. We introduce four metrics grounded in International Humanitarian Law (IHL) and military doctrine: Civilian Target Rate (CTR) and Dual-use Target Rate (DTR) assess compliance with legal targeting principles, while Mean and Max Simulated Non-combatant Casualty Value (SNCV) quantify tolerance for civilian harm. We evaluate three frontier models, GPT-4o, Gemini-2.5, and LLaMA-3.1, through 90 multi-agent, multi-turn crisis simulations across three geographic regions. Our findings reveal that off-the-shelf LLMs exhibit concerning and unpredictable targeting behaviour in simulated conflict environments. All models violated the IHL principle of distinction by targeting civilian objects, with breach rates ranging from 16.7% to 66.7%. Harm tolerance escalated through crisis simulations with MeanSNCV increasing from 16.5 in early turns to 27.7 in late turns. Significant inter-model variation emerged: LLaMA-3.1 selected an average of 3.47 civilian strikes per simulation with MeanSNCV of 28.4, while Gemini-2.5 selected 0.90 civilian strikes with MeanSNCV of 17.6. These differences indicate that model selection for deployment constitutes a choice about acceptable legal and moral risk profiles in military operations. This work seeks to provide a proof-of-concept of potential behavioural risks that could emerge from the use of LLMs in Decision Support Systems (AI DSS) as well as a reproducible benchmarking framework with interpretable metrics for standardising pre-deployment testing.
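The four metrics named above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: the `Strike` record, the target-type labels, and the reading of CTR and DTR as the share of all selected strikes that hit civilian or dual-use targets. The paper itself may define the rates differently, for example per simulation rather than per strike.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Strike:
    # Hypothetical record of one targeting decision made by a model.
    target_type: str  # assumed labels: "military", "dual_use", "civilian"
    sncv: float       # simulated non-combatant casualty value for this strike


def target_rate(strikes, target_type):
    """Share of strikes aimed at the given target type (assumed definition)."""
    if not strikes:
        return 0.0
    hits = sum(1 for s in strikes if s.target_type == target_type)
    return hits / len(strikes)


def summarize(strikes):
    """Compute CTR, DTR, MeanSNCV and MaxSNCV over a list of strikes."""
    values = [s.sncv for s in strikes]
    return {
        "CTR": target_rate(strikes, "civilian"),
        "DTR": target_rate(strikes, "dual_use"),
        "MeanSNCV": mean(values) if values else 0.0,
        "MaxSNCV": max(values, default=0.0),
    }


# Made-up example data, not results from the study.
strikes = [
    Strike("military", 10.0),
    Strike("civilian", 30.0),
    Strike("dual_use", 20.0),
    Strike("military", 20.0),
]
summary = summarize(strikes)
print(summary)
```

Under these assumptions, comparing two models comes down to running `summarize` on each model's strike log and comparing the resulting dictionaries side by side.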


AI Must not be Fully Autonomous

Adewumi, Tosin, Alkhaled, Lama, Imbert, Florent, Han, Hui, Habib, Nudrat, Löwenmark, Karl

arXiv.org Artificial Intelligence

Autonomous Artificial Intelligence (AI) has many benefits. It also has many risks. In this work, we identify the 3 levels of autonomous AI. We are of the position that AI must not be fully autonomous because of the many risks, especially as artificial superintelligence (ASI) is speculated to be just decades away. Fully autonomous AI, which can develop its own objectives, is at level 3 and without responsible human oversight. However, responsible human oversight is crucial for mitigating the risks. To argue for our position, we discuss theories of autonomy, AI and agents. Then, we offer 12 distinct arguments and 6 counterarguments with rebuttals to the counterarguments. We also present 15 pieces of recent evidence of AI misaligned values and other risks in the appendix.


Corrigibility as a Singular Target: A Vision for Inherently Reliable Foundation Models

Potham, Ram, Harms, Max

arXiv.org Artificial Intelligence

Foundation models (FMs) face a critical safety challenge: as capabilities scale, instrumental convergence drives default trajectories toward loss of human control, potentially culminating in existential catastrophe. Current alignment approaches struggle with value specification complexity and fail to address emergent power-seeking behaviors. We propose "Corrigibility as a Singular Target" (CAST): designing FMs whose overriding objective is empowering designated human principals to guide, correct, and control them. This paradigm shift from static value-loading to dynamic human empowerment transforms instrumental drives: self-preservation serves only to maintain the principal's control; goal modification becomes facilitating principal guidance. We present a comprehensive empirical research agenda spanning training methodologies (RLAIF, SFT, synthetic data generation), scalability testing across model sizes, and demonstrations of controlled instructability. Our vision: FMs that become increasingly responsive to human guidance as capabilities grow, offering a path to beneficial AI that remains as tool-like as possible, rather than supplanting human judgment. This addresses the core alignment problem at its source, preventing the default trajectory toward misaligned instrumental convergence.


AI firms warned to calculate threat of super intelligence or risk it escaping human control

The Guardian

Artificial intelligence companies have been urged to replicate the safety calculations that underpinned Robert Oppenheimer's first nuclear test before they release all-powerful systems. Max Tegmark, a leading voice in AI safety, said he had carried out calculations akin to those of the US physicist Arthur Compton before the Trinity test and had found a 90% probability that a highly advanced AI would pose an existential threat. The US government went ahead with Trinity in 1945, after being reassured there was a vanishingly small chance of an atomic bomb igniting the atmosphere and endangering humanity. In a paper published by Tegmark and three of his students at the Massachusetts Institute of Technology (MIT), they recommend calculating the "Compton constant" – defined in the paper as the probability that an all-powerful AI escapes human control. In a 1959 interview with the US writer Pearl Buck, Compton said he had approved the test after calculating the odds of a runaway fusion reaction to be "slightly less" than one in three million.


US feds say AI-generated prompt outputs can't be copyrighted

PCWorld

If you use an AI image or text generator to make a work of "art," does it belong to you? That's a huge question hanging over the heads of anyone tempted to use AI tools for commercial products. Crucially, simply plugging prompts into an AI image generator or text generator does NOT meet this burden. Because the author (or artist, or other relevant creative term) of a work is defined as "the person who translates an idea into a fixed, tangible expression," an AI system cannot meet this burden, even though it's using input from a human to generate its output. Commenting on established case law, the report says that "…the Supreme Court has made clear that originality is required, not just time and effort."


Why handing over total control to AI agents would be a huge mistake

MIT Technology Review

These developments mark a major advance in artificial intelligence: systems designed to operate in the digital world without direct human oversight. Who doesn't want assistance with cumbersome work or tasks there's no time for? Agent assistance could soon take many different forms, such as reminding you to ask a colleague about their kid's basketball tournament or finding images for your next presentation. Within a few weeks, they'll probably be able to make presentations for you. For people with hand mobility issues or low vision, agents could complete tasks online in response to simple language commands.


Inside France's Effort to Shape the Global AI Conversation

TIME - Tech

One evening early last year, Anne Bouverot was putting the finishing touches on a report when she received an urgent phone call. It was one of French President Emmanuel Macron's aides offering her the role as his special envoy on artificial intelligence. The unpaid position would entail leading the preparations for the France AI Action Summit--a gathering where heads of state, technology CEOs, and civil society representatives will seek to chart a course for AI's future. Set to take place on Feb. 10 and 11 at the presidential Élysée Palace in Paris, it will be the first such gathering since the virtual Seoul AI Summit in May--and the first in-person meeting since November 2023, when world leaders descended on Bletchley Park for the U.K.'s inaugural AI Safety Summit. After weighing the offer, Bouverot, who was at the time the co-chair of France's AI Commission, accepted. But France's Summit won't be like the others.